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In this paper we describe in detail the computational algorithm used by our parallel multigrid 
elliptic equation solver with adaptive mesh refinement. Our code uses truncation error estimates to 
adaptively refine the grid as part of the solution process. The presentation includes a discussion of 
the orders of accuracy that we use for prolongation and restriction operators to ensure second order 
accurate results and to minimize computational work. Code tests are presented that confirm the 
overall second order accuracy and demonstrate the savings in computational resources provided by 
adaptive mesh refinement. 
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I. INTRODUCTION 

Elliptic equations appear throughout engineering, sci- 
ence, and mathematics. Our primary interest is in cUip- 
tic problems that arise in the context of numerical rela- 
tivity. Currently, the field of numerical relativity is be- 
ing driven by rapid progress on the experimental front. 
There are several ground-based gravitational wave detec- 
tors in operation today, and their sensitivities are quickly 
approaching a level at which interesting science can be 
done. There are also plans for a space-based gravita- 
tional wave detector, LISA, to be launched around 2012. 
The scientific payoff of these instruments will depend 
largely on our ability to theoretically predict and ex- 
plain the observed signals. For both ground-based and 
space-based gravitational wave detectors, the most com- 
mon and strongest signals are expected to come from 
colliding black holes. Thus, much of the numerical rela- 
tivity community has directed its efforts toward modeling 
binary black hole systems. 

When black holes spiral together and collide, they gen- 
erate gravitational waves. The black hole "source" region 
has a length scale of GM/c^, where G is Newton's con- 
stant, M is the total mass of the two black holes, and c 
is the speed of light. The gravitational waves produced 
by the source have a length scale up to ~ 100 GM/c^. 
Herein lies one of the challenges of modeling binary black 
hole systems with finite difference methods. The source 
region requires grid zones of size <,0.01 GM/c^ to accu- 
rately capture the details of the black holes' interaction, 
while the extent of the grid needs to be several hundred 
GM/c? to accurately capture the details of the gravita- 
tional wave signal. Many research groups in numerical 
relativity are starting to use adaptive mesh refinement 
(AMR) techniques to deal with this discrepancy in length 
scales [1-11]- With AMR the grid resolution is allowed 
to vary across the computational domain so that com- 
putational resources can be concentrated where they are 
most needed. For the binary black hole problem, we need 
a high resolution region to cover the small-scale detail 
of the source, but the gravitational waves far from the 
source can be modeled with sufficient accuracy using a 
much lower resolution grid. 

Elliptic equations occur in several contexts in numeri- 
cal relativity. Einstein's theory of gravity is a system of 



partial differential equations consisting of four constraint 
equations and a set of evolution equations (sec for exam- 
ple Ref. The constraint equations restrict the data 
at each time step so in particular the initial data cannot 
be chosen freely. With suitable assumptions about the 
nature of the initial data, the constraint equations can 
be written as an elliptic system |l3j . 

Having solved the constraints for the initial data, those 
data arc evolved forward in time by the evolution equa- 
tions. At an analytical level, the evolution equations 
guarantee that the constraint equations will continue to 
be satisfied. However, in numerical modeling, numer- 
ical errors will introduce violations of the constraints. 
These violations can be disastrous because the evolu- 
tion equations admit unphysical, constraint violating so- 
lutions that grow exponentially [14-17]. One possible 
strategy for preventing this disaster is to impose the con- 
straints during the evolution, which means solving the 
elliptic constraint equations after each time step [18-20]. 

Elliptic equations also arise in numerical relativity 
when one is faced with choosing a coordinate system. 
In Einstein's theory the coordinate system must be cho- 
sen dynamically as the gravitational field evolves forward 
in time. The choice of coordinate system can have a 
dramatic effect on the performance of a numerical rel- 
ativity code. Researchers have developed many differ- 
ent strategies for choosing a coordinate system. Some of 
these strategies require the solution of elliptic, parabolic 
or hyperbolic equations, and some involve algebraic con- 
ditions. Some researchers feel that the elliptic conditions 
might be best, but the cost of solving elliptic equations at 
each time step has made the other choices more practical 
and more popular. 

In the numerical relativity community we need the ca- 
pability of solving elliptic equations quickly on adaptive, 
non-uniform grids. No doubt this same need exists in 
other areas of science and applied mathematics. 

Multigrid methods originated in the 1960's with the 
work of Fedorenko and Bakhvalov [2lll23 . l2^ . The y were 
further developed in the 1970's by Brandt |24 l25| . and 
are now the preferred methods for solving elliptic partial 
differential equations. The advantage of multigrid is its 
speed — multigrid algorithms only require order oper- 
ations to solve an elliptic equation, where A^'^ is the num- 
ber of grid points. In this paper we describe our code. 
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AMRMG, which solves nonlinear elliptic equations using 
multigrid methods with adaptive mesh refinement. The 
idea of combining multigrid with AMR is not new [21- 
24], although there are a number of features of our code 
that distinguish it from the discussions we have seen. In 
particular, AMRMG uses cell-centered data, as opposed 
to node centered data. AMRMG uses the Full Approx- 
imation Storage (FAS) algorithm, and therefore is not 
restricted to linear elliptic equations. AMRMG uses the 
Paramesh package to implement parallelization and to 
organize the multigrid structure pa . l29l | . In Ref . [s^l we 
used AMRMG to solve numerically for distorted black 
hole initial data. 

AMRMG is currently set up to solve second order 
equations that are semi-linear (the second order deriva- 
tive terms are linear in the unknown field). The FAS 
scheme is applicable for fully nonlinear equations as well, 
and in principle AMRMG can be modified to solve any 
nonlinear equation. The equations of current interest for 
us are semi-linear, therefore we have not tested AMRMG 
on any fully nonlinear systems. 

In this paper we describe the algorithm behind AM- 
RMG in detail. In Sec. El we present the overall concep- 
tual framework behind our code, and discuss some of the 
choices made in its development. Section Hill is devoted 
to a discussion of guard cell filling, which determines the 
coupling between fine and coarse grid regions. In Sec. II VI 
we review the FAS algorithm and in Sec. we discuss in 
detail the restriction and prolongation operators used by 
AMRMG. In Sec. I VII we describe the calculation of the 
relative truncation error and how it is used to control 
the grid structure. Section FVIII contains the results of a 
number of code tests involving the calculation of initial 
data for numerical relativity. 



II. MULTIGRID WITH AMR 

The simplest technique for solving an elliptic problem 
is relaxation. The equation (or system of equations) is 
written in discrete form as fii^cj)) = gi{(t>) where i labels 
the grid points and (f)i denotes the numerical solution. A 
"relaxation sweep" consists of refining the approximate 
solution (j)f'^ by solving the system f^{(l)^°^) = 3^(0°''^) 
for (f)^™ ■ [In the simplest case / is the identity and re- 
laxation is written as = gi{(j)°^'^)] The success of 
the relaxation method depends on how the finite differ- 
ence equations are split into a left-hand side /i((/)) and 
a right-hand side gi{(j>)- When relaxation does work, it 
is slow to converge. In particular the long wavelength 
features of the solution must slowly "diffuse" across the 
grid with successive relaxation sweeps. 

To solve a problem with multigrid methods we intro- 
duce a hierarchy of grids with different resolutions. For 
the moment, consider the case in which we seek the nu- 
merical solution of an elliptic equation on a uniform grid 
of size that covers the entire computational domain. 
We introduce grids of size (iV/2)^, (N/A)^, etc., each 



covering the computational domain. On each grid the 
finite difference equation, or an associated equation, is 
solved by relaxation. The equations to be solved on each 
multigrid level are discussed in Sec. IIVI For now, we 
simply note that the equations are chosen so that re- 
laxation on the coarse grids quickly captures the long 
wavelength features of the solution. Relaxation on the 
fine grids captures the short wavelength features. The 
grids in the multigrid hierarchy communicate with one 
another through restriction and prolongation operators. 
Restriction takes data on a grid in the hierarchy and re- 
stricts it to the next coarsest grid. Prolongation takes 
data on a grid in the hierarchy and interpolates it onto 
the next finest grid. Different multigrid algorithms use 
different sequences of grids in solving elliptic problems, 
but the most basic sequence is the V-cycle. In a multi- 
grid V-cycle one starts with the finest grid, steps down 
the grid hierarchy to the coarsest grid, then steps back 
up to the finest grid. 

In the context of a time-dependent problem, adaptive 
mesh refinement (AMR) means that the grid structure 
adapts in time to meet the changing demands as the fields 
evolve. In the context of an elliptic (time-independent) 
problem, AMR means that the grid structure is deter- 
mined adaptively, as part of the solution process, in an 
attempt to minimize numerical errors. 

We use the Paramesh package to organize the grid 
structure for our code. Paramesh covers the computa- 
tional domain with blocks of data of varying spatial res- 
olution. These blocks form a tree data-structure. They 
are logically Cartesian, consisting of a fixed number of 
cells. We typically use 8^ cells for each block. Figure 1 
shows an example one-dimensional grid. The numbers in 
that figure indicate the resolution level, and the letters 
denote blocks. At the base of the tree structure is a single 
block, labeled 1 A. Paramesh refines blocks by bisection in 
each coordinate direction. In this one-dimensional exam- 
ple block lA is refined into two blocks. 2A and 2B. Since 
each data block contains the same number of cells, level 
2 has twice the resolution as level 1. Using Paramesh 
terminology, block lA is the "parent" of blocks 2A and 
2B, and blocks 2 A and 2B are the "children" of block 
lA . In Fig. 1 Paramesh has also refined blocks 2A and 
2B to create blocks 3A, 3B, 3C and 3D. Further refine- 
ments yield the non-uniform grid shown in the figure. 
Paramesh always creates grid structures in which adja- 
cent blocks' refinement levels differ by no more than one. 

Our first task is to decide how to carry out a ba- 
sic multigrid V-cycle on a non-uniform grid structure. 
There are two natural approaches. The first approach is 
the Fast Adaptive C omp osite Grid Method (FAC) devel- 
oped by McCormick[3lj. In this approach the relaxation 
sweeps extend across the entire computational domain, 
and one defines the succession of multigrid levels by re- 
stricting the highest resolution subgrid to the next lower 
resolution. As an example based on Fig. 1, we let the top 
multigrid level consist of blocks 4A-5A-5B-5C-5D-4D-3C- 
4E-4F. After carrying out a series of relaxation sweeps 
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FIG. 1: Example of a one-dimensional grid structure. The numbers on the left denote the resolution level, and the letters label 
blocks of data. 



on this non-uniform grid, we step down the multigrid V- 
cycle by restricting the data in blocks 5A-5B-5C-5D to 
resolution level 4. Thus, the next multigrid level is de- 
fined by blocks 4A-4B-4C-4D-3C-4E-4F. After relaxing 
on this grid, we restrict the resolution level 4 blocks to 
resolution level 3. This defines the next multigrid level as 
3A-3B-3C-3D. We can continue in this fashion to define a 
complete hierarchy of multigrid levels, each covering the 
entire computational domain. 

The second approach, the one we use for AMRMG, 
is to define the grids in the multigrid hierarchy to coin- 
cide with the different resolution levels. This is the origi- 
nal multi-level adaptive technique (MLAT) proposed by 
Brandt [2^^2|- an example based on Fig. 1, the top 
multigrid level consists of all blocks at resolution level 
5, namely, 5A-5B-5C-5D. After carrying out a series of 
relaxation sweeps on the level 5 blocks, we restrict that 
data to level 4. Then the next multigrid level consists 
of blocks 4A-4B-4C-4D-4E-4F. After relaxing on these 
blocks we restrict the resolution level 4 data to resolu- 
tion level 3. Then the next multigrid level consists of the 
level 3 blocks 3A-3B-3C-3D. We continue in this fashion 
to define a complete multigrid hierarchy. 

We have built and tested a one-dimensional multi- 
grid code based on the FAC approach. That code works 
quite well. However, the MLAT approach appeared to 
us to be more straightforward to implement in a three- 
dimensional code based on Paramesh. For this reason 
AMRMG defines the levels in the multigrid hierarchy by 
resolution. Apart from the issue of implementation, the 
MLAT approach has an advantage in solving problems 
in which only a small region of the computational do- 
main requires high resolution. With the FAC approach, 
in which relaxation always extends across the entire com- 
putational domain, a lot of unnecessary computational 
effort can be expended on relaxation in the low resolution 
regions. On the other hand, the FAC approach has the 
advantage over MLAT in maintaining a tighter coupling 
between regions of different resolutions. For example, for 



the grid shown in Fig. 1, the data in blocks 4A and 4D 
effectively provide boundary conditions for relaxation in 
blocks 5A through 5D. With FAC, that boundary infor- 
mation is updated between every relaxation sweep across 
the top multigrid level (4A-5A-5B-5C-5D-4D-3C-4E-4F). 
With MLAT, in which the top multigrid level consists of 
blocks 5A-5B-5C-5D, the boundary information is only 
updated once each V-cycle. 



III. GUARD CELL FILLING 

AMRMG uses cell centered data. When we apply the 
relaxation formula fi{(t>^™) = gi{(j)°^'^) to a cell adja- 
cent to a block face, the finite difference stencil extends 
beyond the block. Paramesh uses layers of guard cells 
surrounding each block to hold data from beyond the 
block boundaries. In Fig. 2 we show a portion of a one- 
dimensional grid with a fine grid block on the left and 
a coarse grid block on the right. The grid points are la- 
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FIG. 2: A portion of a one-dimensional grid, showing three 
cells from a fine grid block on the left and two cells from a 
coarse grid block on the right. 

belcd by their distance from the block interface in units 
of the fine grid cell size Ax. Thus, the fine grid points 
are —1/2, —3/2, etc. and the coarse grid points are 1, 3, 
etc. The gray circle at location -fl/2 is a guard cell for 
the fine grid block, and the gray square at location —1 is 
a guard cell for the coarse grid block. Guard cell values 
are obtained by interpolation from surrounding interior 
data points. We want to consider how errors in guard 
cell filling affect the accuracy of the solution. 
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Consider the simple example of the Poisson equation 
in one dimension, d'^4)/dx'^ = p. For the moment let 
us consider a uniform numerical grid with grid spacing 
Ax. With standard second order centered differencing, 
the discrete Poisson equation is 

^^±1^|^^±^ + 0(A..^)=P.. (1) 

where i labels the grid points. The term O(Ax^) is the 
truncation error obtained from discretization of the sec- 
ond derivative. We can rewrite Eq. in a form appro- 
priate for relaxation as 

0"™ = li't^f+i + ^A^-V. + 0{Ax^) . (2) 

When we apply this relaxation formula, the truncation 
error dictates that the numerical solution will have errors 
of order Ax^. That is, the numerical solution will be 
second order accurate. 

For the non-uniform grid of Fig. 2, the discrete equa- 
tion Q and the relaxation formula (|2Jl apply as shown 
in the fine grid region, where i = —1/2, —3/2, etc. For 
relaxation at the grid point i = —1/2, we need the guard 
cell value (/)i/2- We want the guard cell value to be suf- 
ficiently accurate that it does not spoil the second order 
convergence of the solution. It is clear from these equa- 
tions that errors of order Aa;^ in the value of 4>i/2 can 
be absorbed into the truncation error already present. 
Thus, we expect the numerical solution to be second or- 
der convergent if the guard cells are filled to fourth (or 
higher) order accuracy. 

As far as we know, fourth order guard cell filling is 
sufficient to produce a second order accurate solution for 
second order partial differential equations discretized on 
a non-uniform grid with standard second order differ- 
encing. Fourth order guard cell filling, however, is not 
a necessary condition. There is a "rule of thumb" in 
the computational mathematics community that can be 
summarized as follows ,26„ ,33| : Errors of order Ax^ 
that occur on a subspace of dimension m in a space of 
dimension n will often contribute to the solution like er- 
rors of order Aa;P+"~'" from the bulk. Thus, we antici- 
pate that errors of order Aa;^ in guard cell filling, which 
occur on the two-dimensional block boundaries in the 
three-dimensional space, will contribute like errors of or- 
der Ax* from the bulk and will not spoil the second order 
convergence of our code. For the problems that we have 
studied this is indeed the case. 

The guard cell filling scheme that we use was written 
by Kevin Olson as part of the standard Paramcsh pack- 
age. The process of filling the guard cells of a fine grid 
block that is adjacent to a coarse grid block proceeds 
in two steps. The first step is a restriction operation in 
which cells from the interior of the fine grid block are 
used to fill the interior cells of the underlying "parent" 
block. The restriction operation is depicted for the case 
of two spatial dimensions in the left panel of Fig. 13 The 
restriction proceeds as a succession of one-dimensional 



quadratic interpolations, and is accurate to third order 
in the grid spacing. Note that the fine grid stencil used 
for this step (nine black circles in the figure) cannot be 
centered on the parent cell (gray square) . In each dimen- 
sion the stencil includes two fine grid cells on one side of 
the parent cell and one fine grid cell on the other. The 
stencil is always positioned so that its center is shifted 
toward the center of the block (assumed in the figure to 
be toward the upper left). This ensures that only interior 
fine grid points, and no fine grid guard cells, are used in 
this first step. 

For the second step, the fine grid guard cells are filled 
by prolongation from the parent grid. Before the prolon- 
gation, the parent block gets its own guard cells (black 
squares in the right panel of Fig.|SJl from the neighboring 
block at the same refinement level. The stencil used in 
the prolongation operation is shown in the right panel of 
Fig. O The prolongation operation proceeds as a succes- 
sion of one-dimensional quadratic interpolations, and is 
third order accurate. In this case, the parent grid stencil 
includes a layer of guard cells (black squares), as well as 
its own interior grid points (gray squares). At the end 
of this second step the fine grid guard cells arc filled to 
third order accuracy. 

When Paramesh fills the guard cells of a parent block, 
it also fills the guard cells of the parent's neighbor at the 
same refinement level. In Fig. 3, the parent's neighbor 
is the coarse grid block on the right side of the refine- 
ment boundary. Since we arc using the second approach 
to multigrid outlined in Sec. ^ we do not relax in the 
parent's neighbor block until we step down the multigrid 
hierarchy. Thus, the guard cell values assigned to a par- 
ent's neighbor by the Paramesh guard cell filling routine 
are not used by AMRMG. 



IV. THE FAS ALGORITHM 

AMRMG uses the Full Approximation Storage (FAS) 
multigrid algorithm [2^ . In this section we provide a 
brief overview of FAS, and discuss the order of restriction 
and prolongation operators used for stepping down and 
up the multigrid hierarchy. 

We are interested in the nonlinear elliptic differential 
equation E{-) = p, where i? is a (possibly) nonlinear 
elliptic operator. To avoid confusion with notation for 
exact and approximate solutions, we use a centered "dot" 
as a placeholder for the unknown function. Now consider 
a simple multigrid hierarchy consisting of two grids, a 
fine grid at level 2 and a coarse grid at level 1. The 
differential equation E(-) = p, discretized on the finest 
level, becomes i?2(-) = P2- Here, the subscripts denote 
the multigrid level. What we seek is a solution of the 
difference equations E2{-) = P2- 

The basic V-cycle for the FAS algorithm consists of the 
following steps. 

Step 1. Guess a trial solution (j)2 (for example, 4>2 ~ 0) 
and carry out some number of relaxation sweeps on the 
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FIG. 3: The picture on the left shows the first step in guard cell filling, in which one of the parent grid cells (gray square) is 
filled using quadratic interpolation across nine interior fine grid cells (black circles). The other parent grid cells are filled using 
corresponding stencils of nine interior fine grid cells. The picture on the right shows the second step in which two fine grid 
guard cells (gray circles) are filled using quadratic interpolation across nine parent grid values (squares). These parent grid 
values include one layer of guard cells (black squares) obtained from the parent's neighbor on the right side of the interface, 
and two layers of interior cells (gray squares). 



equation i?2(-) = P2 to obtain an approximate solution 

4>2- 

Step 2. Construct the coarse grid source 

Pl^n{p2-E2{(j)2))+E^{Tl<j)2) . (3) 

Here TZ denotes the restriction of data from multigrid 
level 2 to level 1. Also, Ei denotes the discretization 
of the elliptic operator E on the coarse grid 1. Loosely 
speaking, the term —TZ{E2{4'2)) removes the predomi- 
nantly short wavelength part of the source that the fine 
grid has already captured in Step 1. The term Ei{TZ(j)2) 
returns the long wavelength part of the source that was 
removed by subtracting TZ{E2{4>2))- 
Step 3. Start with the trial solution (pi = 'R-4'2 and carry 
out some number of relaxation sweeps on the equation 
Ei{-) = pi to obtain an approximate solution 01. Alter- 
natively, if possible, solve £'i(-) = pi exactly for 0i. 
Step 4. Construct the trial solution 

02 =(/.2+7'(-7^02 . (4) 

Here V denotes the prolongation of data from multi- 
grid level 1 to level 2. Loosely speaking, the term 
7^(— 7?,02 + 0i) removes the long wavelength part of <f>2 
and replaces it with 0i, which contains primarily long 
wavelength information due to the construction of pi . 
Step 5. Start with the trial solution 02 from Step 4 and 
carry out some number of relaxation sweeps on the equa- 
tion E2{-) = P2 to obtain an improved approximate solu- 
tion 02- 

For successive V-cycles the approximate solution 02 
from Step 5 is used as the trial solution 02 in Step 1. 
The FAS V-cycle can be generalized in an obvious way 
to any number of multigrid levels. 



At the bottom of each V-cycle Step 3 instructs us to 
find an exact or approximate solution of the equation 
Ei{-) = pi- The subscript '1' denotes the coarsest level 
in the multigrid hierarchy. It is important for the per- 
formance of any multigrid code to solve this equation 
accurately. Solving the level 1 equation can be a po- 
tential bottleneck for our code because Paramesh places 
the data for each block on a single processor. When the 
algorithm is at the bottom of a V-cycle, only a single 
processor is active. 

The simplest strategy for solving the level 1 equation 
^'i(') = Pi is to carry out relaxation sweeps, just as we 
do for the higher multigrid levels. Typically our coarsest 
multigrid level is a single data block with 8'^ interior grid 
points. We find that with Robin boundary conditions 
it typically takes about one hundred relaxation sweeps 
to solve the level 1 equation to sufficiently high accuracy. 
With fewer sweeps at this level the code can require more 
V-cycles to solve the elliptic problem. 

The concern is that the solution of the level 1 equation, 
requiring ^ 100 sweeps with only one processor active, 
can dominate the run time for the code. However, it 
turns out that the run time for AMRMG is dominated 
by communications calls made in Paramesh. Currently 
AMRMG uses version 3.0 of Paramesh, which is one of 
the first versions of Paramesh to run under MPI. More 
recent versions are better optimized, but we have not 
switched to the latest version of Paramesh because of 
the special modifications that AMRMG requires. 

One way that we have improved the performance of 
our code is to bypass the communications calls made by 
Paramesh when solving the level 1 equation. The data 
for the level 1 equation always resides on block #1 on 
processor #1, so no communication among processors is 
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needed. We have bypassed the Paramesh guard cell filling 
routine by writing a routine that directly fills the guard 
cells of this block using the outer boundary conditions. 

We are primarily interested in solving elliptic prob- 
lems that are semi-linear, that is, problems in which the 
second order derivative terms are linear in the unknown 
field. For these problems the equation to be solved at the 
bottom of each V-cycle takes the form Ai0i = pi, where 
Ai is the Laplacian operator (not necessarily on flat 
space with Cartesian coordinates) discretizcd on multi- 
grid level 1. In these cases we have an alternative to relax- 
ation, namely, direct matrix inversion: (pi = (Ai)~^pi. 
We have implemented matrix inversion for level 1 us- 
ing the direct Gaussian elimination routine from the LA- 
PACK libraries [s^l- With Robin boundary conditions, 
we must solve the level 1 equation for (jj values in the 
guard cells as well as interior cells. With 8"^ interior grid 
points and one layer of guard cells, we have 10^ values 
to determine at the bottom of each V-cycle. Therefore 
the matrix to be inverted has dimensions 1000 x 1000. 
Our tests show that it takes longer (by a factor of ^ 10) 
to solve the level 1 equation by direct matrix inversion 
than by relaxation, assuming the Paramesh communica- 
tions calls have been bypassed. However, in either case 
the time required to solve the level 1 equation is a small 
fraction of the overall runtime for the code. Thus, we 
prefer to use the direct matrix inversion whenever possi- 
ble because, with matrix inversion, the accuracy of the 
level 1 solution is insured. 



V. RESTRICTION AND PROLONGATION 

As described in Sec. IIIII guard cells are filled with a 
combination of restriction and prolongation operations. 
The operators used for guard cell filling are third order 
accurate, and we denote these by ^^^TZ for restriction and 
^^^V for prolongation. What order restriction and prolon- 
gation operators do we use for stepping down and up the 
multigrid hierarchy? The answer is that we use a combi- 
nation of second, third, and fourth order operators. 

The restriction operators in Paramesh are always de- 
fined in such a way that only interior cells from the child 
blocks are used to fill the interior cells of a parent block. 
The fine grid stencil is positioned to keep the coarse grid 
point as close as possible to the center of the stencil. For 
the case of second order restriction, ^^''TZ, the coarse grid 
point lies at the center of the stencil and gets its value 
from a succession of linear interpolations in each dimen- 
sion. The case of third order restriction, ''''T?., is depicted 
in the left panel of Fig. O and is described in Sec. Illll 

The second, third, and fourth order prolongation op- 
erators in Paramesh use a succession of (respectively) 
linear, quadratic, and cubic interpolations in each dimen- 
sion to fill the fine grid cells. The prolongation operators 
use both interior cells and guard cells from a parent block 
to fill both interior and guard cells of child blocks. The 
right panel of Fig. O shows the stencil used by the third 



order prolongation operator, ("^^P, to fill fine grid guard 
cells on the right side of a fine grid block. This same sten- 
cil is used to fill the first layer of interior cells (the layer 
of interior fine grid cells adjacent to the block boundary). 
For the second and third layer of interior fine grid cells, 
the stencil is shifted to the left by one coarse grid point. 
This pattern of stencil shifting continues across the right 
half of the fine grid block until the midpoint of the block 
is reached. The stencils used for the left half of the fine 
grid block are the mirror images of those used for the 
right half. 

The version of Paramesh currently used by AMRMG 
allows for second and third order restriction, and (in prin- 
ciple) arbitrary order prolongation. We have carried out 
many numerical tests to help us choose among different 
combinations of restriction and prolongation operators 
for stepping down and up the V-cycles. For all of these 
tests we used third order restriction and third order pro- 
longation to fill the fine grid guard cells, as described in 
Sec. IIIII In our tests we did not consider prolongation 
orders higher than four. 

As we have presented it, the FAS algorithm uses two 
restriction operators in Step 2, one restriction operator 
in Step 3, and one restriction operator in Step 4. It uses 
one prolongation operator in Step 4. One could con- 
sider distributing the (assumed linear) restriction opera- 
tor through the first term in Eq. |(2J) and treating the op- 
erators independently. Likewise, one could consider dis- 
tributing prolongation operator through the second term 
in Eq. and treating the operators independently. We 
have not considered the consequences of splitting these 
terms. Moreover, AMRMG is written such that the cal- 
culation Tl<f>2 from Step 2 is used as the trial solution 
01 = 7^02 for Step 3. Thus, we have not tested the 
consequences of treating these restriction operators in- 
dependently. Note that the restriction of the fine grid 
solution, TZ(t)2, appears in Step 4 as well as Steps 2 and 
3. Our tests show that the order of the restriction oper- 
ator in Step 4 must agree with the order used for TZ(f)2 
in Steps 2 and 3. If not, the algorithm will often fail to 
converge in the sense that the residual (defined below) 
will not decrease with successive V-cycles. 

The options that remain for restriction and prolonga- 
tion operators can be expressed by rewriting Eqs. ^ and 

m- 

Pi = ('')7^(p2-^2(02)) + ^l(^"^7^(/.2) , (5a) 

02 = 02 +^''^(-^''^7^02+0l) . (5b) 

The letters a, b, and c represent the orders of restriction 
and prolongation operators that appear in stepping down 
and up the V-cycles. 

We want to find values for a, 5, and c that will give 
the best performance. In judging the performance of our 
code we are looking to see how quickly the residual de- 
creases with successive V-cycles for a fixed non-uniform 
mesh. The residual at each point in the computational 
domain is defined by res = Pn — ^'n(0n), where n is 
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the highest refinement level at that point and </>„ is the 
approximate solution. The norm of the residual is com- 
puted as 



(6) 



The sum extends over the grid points that cover the com- 
putational domain at the highest refinement level. (In 
Fig. (2) these would be the interior points of blocks 4A- 
5A-5B-5C-5D-4D-3C-4E-4F.) The number N is the to- 
tal number of such grid points. The norm (res) defined 
above is similar to the usual L2 norm, but lacks a factor 
of the cell volume in the "measure" of the sum. That is, 
the usual L2 norm would be written as ^yl^vres^)/V 
where v is the volume of each grid cell and V is the to- 
tal volume of the computational domain. By omitting 
the factors of cell volume, the norm (res) gives equal 
weighting to the residuals in each grid cell, regardless of 
resolution. 

To be specific, we will quote the results for the sim- 
ple test problem A0 = p, where A is the fiat space 
Laplacian in Cartesian coordinates. We use the source 
/9 = (6 — 9r^) exp(— r'^) with r = ^ x'^ + y'^ + z"^ . At the 
boundaries we use the Robin condition ^[?'(0 — 1)] = 0. 
The analytic solution to this problem is (j> = 1 + (1 — 
exp(— r'^))/r. The numerical solution is computed with a 
fixed three-level "box-in-box" grid structure. The high- 
est resolution region has cell size Aa; = 0.125 and covers 
a cubical domain with x, y, and z ranging from —2 to 2. 
The medium resolution region has cell size Ax — 0.25. 
It covers the domain from —4 to 4 that is exterior to the 
high resolution region. The low resolution region has cell 
size Ax = 0.5. It covers the domain from —8 to 8 that 
is exterior to the medium resolution region. It has been 
our experience that the generic behavior of AMRMG is 
fairly well represented by this simple test case. 

Our first observation is that with a = 2 the residual 
gets "stuck" after a few V-cyclcs. The norm (res) drops 
to about 10^*, but no further. This happens regardless 
of the values chosen for b and c. In the limit of high 
resolution the truncation error is less than the residual, 
and the code fails to show second order convergence of 
the solution. Thus, we can eliminate the cases in which 
a = 2 and focus on a = 3. 

For a = 3 the norm of the residual decreases with suc- 
cessive V-cycles to values well below the truncation error. 
Table HI shows the average change in the common loga- 
rithm of (res) for each V-cycle, as a function of the num- 
ber of relaxation sweeps at each multigrid level. (This 
excludes the first multigrid level, at the bottom of each 
V-cycle, where we compute the exact solution using ma- 
trix inversion.) The best performance is obtained with 
(a, b, c) = (3, 2, 4). Note that the norm of the residual be- 
comes insensitive to the number of relaxation sweeps as 
the number of sweeps increases beyond four or five. This 
is because, as observed in Sec. m the higher multigrid 
levels that have lower resolution neighbors can only re- 
ceive updated boundary information once each V-cycle. 



number 

of 
sweeps 


order of restriction and 
prolongation operators (a,6,c) 


(3,2,2) 


(3,2,3) 


(3,2,4) 


(3,3,2) 


(3,3,3) 


(3,3,4) 


1 


-0.00 


-0.11 


-0.16 


-0.11 


-0.13 


-0.33 


2 


-0.32 


-0.42 


-0.50 


-0.33 


-0.46 


-0.56 


3 


-0.56 


-0.75 


-0.77 


-0.43 


-0.56 


-0.69 


4 


-0.77 


-0.88 


-1.13 


-0.50 


-0.63 


-0.74 


5 


-0.85 


-1.08 


-1.21 


-0.56 


-0.68 


-0.79 


6 


-0.87 


-1.15 


-1.23 


-0.62 


-0.74 


-0.83 



TABLE I: Average change in log( (res)) per V-cycle. 



It does not help to continue relaxation sweeps when the 
boundary information is "old" and needs to be updated. 
We typically use four relaxation sweeps, with red-black 
Gauss-Seidel ordering 

The results of our testing lead to the following formulas 
for Steps 2 and 4 of the FAS algorithm: 



Pi = ^'^7^(p2-S2(02))^ 

02 = 02 + ^^^7'(-(3'7^02 



Sl(^'^7^02) 
-0i) • 



(7a) 
(7b) 



In Step 3 we use the trial solution (pi 



(3) 



7^02■ Re- 



call that we have not tested the algorithm with order of 
restriction greater than 3, or with order of prolongation 
greater than 4. 

Conventional wisdom for determining the orders of re- 
striction and prolongation used for multigrid transfer op- 
erations is that the following should be satisfied: 



C'7^ + Op > d 



(8) 



Here, O-jz, O-p, and O-d are the orders of restriction, pro- 
longation, and the differential operator, respectively |3fil |. 
For a uniform grid, AMRMG acts as a typical FAS multi- 
grid solver and we achieve acceptable convergence rates 
as long as the transfer operators satisfy Eq. 0. With 
a nonuniform grid, the restriction operator denoted '•"■'T?. 
in Eqs. (jSJ must be third or higher order, at least in the 
vicinity of mesh refinement boundaries, for the code to 
achieve both second order accuracy and optimal conver- 
gence rates. This is not surprising since, with the MLAT 
approach, data that is restricted from a high resolution 
region (for example, data restricted from block 5D to 
block 4C in Fig. 1) can serve as boundary data for relax- 
ation in a coarse grid region (block 4D in Fig. 1). With 
restriction order less than 3, such boundary data yield 
truncation errors greater than 0(Ax) when they appear 
in a discrete second derivative. One of our goals for AM- 
RMG is to achieve second order accuracy and good con- 
vergence rates with minimal modification of the existing 
Paramesh framework. For this reason we have not ex- 
plored the possibility of using modified finite difference 
stencils or modified transfer operators in the vicinity of 
mesh refinement boundaries. 
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VI. TRUNCATION ERROR AND GRID 
CONTROL 

AMRMG adapts the grid structure to the problem at 
hand in an attempt to keep the local truncation error un- 
der control. The local truncation error is defined across 
the computational domain on the grid that consists of 
the highest resolution blocks. In Fig. (2) these would 
be blocks 4A-5A-5B-5C-5D-4D-3C-4E-4F. Let us refer to 
this non-uniform grid as grid h. Then the local trunca- 
tion error is defined by 25] 

^ EM) - (EmiH , (9) 

where (jj is the exact solution of the continuum equation 
— p, and (f)\h is the projection of (f) onto grid h. The 
discretization of the operator E on grid h is denoted by 
Ef, . In a similar manner we define the local truncation 
error th on a grid H that is constructed from the par- 
ents of grid h blocks. Grid H covers the computational 
domain with half the resolution of grid h. The difference 
between the truncation errors on grids H and h is 

TH-TlTh = Eh{(I)\h) - iEi<l)))\H 

^nEh{q}\h) + n{Ei(j)))\h , (10) 

where 7^ is a linear operator that restricts data from h 
to H. Let us assume that TZ is third order accurate in 
the grid spacing. Then the second and fourth terms in 
Eq. H10() cancel to third order and, to this same order of 
accuracy, we find th - TZrh « EH{(f>\H) - 

The relative local truncation error is defined on grid H 
by M 

^EniTZM-nEhiM) , (11) 

where 4>h is the approximate numerical solution from grid 
h. Again we assume that the restriction operator is accu- 
rate to third order. Since the approximate solution coin- 
cides with the exact solution to leading order. (f>h ~ 
we see from Eqs. (|10|) and Hll() that to leading order in 
the grid spacing the relative truncation error is related 
to the local truncation errors by k, th — T^Th- Since 
we use second order differencing for our elliptic problems 
the truncation errors are proportional to the square of the 
grid spacing. Then t// « 4 TZt^ and the relative trunca- 
tion error is given by « 3TZt}i. This relation can be 
prolonged to the finest grid h, giving Vt^ w SVTZtii- 
The prolongation operator V, like the restriction opera- 
tor TZ, is assumed to be third order accurate in the grid 
spacing. Then to third order accuracy VTZ is the identity 
operator on h, and we have 

rn « JPrf . (12) 
Together, Eqs. 1^1} and give 

Th'=^l{^'^VEHi^'^n^h)-Ehi4>h)) ■ (13) 
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FIG. 4: Analytic truncation error and calculated truncation 
error along the 2-axis. 



In AMRMG, we use this approximation of the local trun- 
cation error to monitor the errors and control the grid 
structure. Note that since the truncation error r/j is pro- 
portional to the square of the grid spacing, the result 
(|13|l is valid to leading order only if third (or higher) 
order restriction and prolongation operators are used. 

For the test problem described in Section it is 
straightforward to calculate the analytic truncation error. 
Figure 21 shows a comparison of the analytic truncation 
error with the computed approximation to the truncation 
error (|13|l for that test problem. The analytic truncation 
error is shown as a thick solid line, while the calculated 
truncation error is a thin line with filled circles. 

In practice we start the solution process by specifying 
a rather coarse grid structure, sometimes uniform some- 
times not. The code V-cycles until the norm of the resid- 
ual across grid h is less than the norm of the local trunca- 
tion error across grid /i. We write this as {res)\h < {Th)\h, 
where the norm of the truncation error is defined in the 
same way as the norm of the residual, Eq. ©. The code 
usually requires two or three V-cycles to meet this crite- 
rion. We then compute the norm of the truncation error 
{'''h)\b for each block of grid h. Any block whose norm 
is greater than some threshold value, (T^)|t > Tmax, is 
fiagged for refinement. Paramesh rebuilds the grid struc- 
ture and redistributes the data across processors. To 
obtain a trial solution in the newly formed blocks we 
prolong the solution from the parent blocks. The code 
then carries out V-cycles on this new grid structure with 
its new highest-resolution grid h. The entire process re- 
peats until all blocks satisfy {Tfi)\b < T„^ax and no blocks 
arc flagged for refinement. At this point the code contin- 
ues to V-cycle until two conditions are satisfied: (1) the 
norm of the residual in each block is less than the norm of 
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FIG. 5: Errors along the z-axis for AMR grid and uniform 
grids of various resolutions. 

the truncation error. {res)\i, < {Th)\b', and (2) the norm 
of the residual across the entire grid is less than some 
threshold value, (res)\h < fesmax- If only the first con- 
dition is desired, we simply set resmax to a very large 
value. 

We have tested the code using second, third, and fourth 
order prolongation operators for the calculation of trial 
solutions in newly formed blocks. We find that none 
of these operators is consistently better than the others. 
The entire adaptive mesh, multigrid algorithm is not very 
sensitive to the order of prolongation used in this step. 
We typically use the fourth order operator '"''P. 

The grid control scheme used by AMRMG works well. 
It insures that the truncation error in each block across 
the computational domain is uniformly low, less than 
Tmax, and that the errors coming from the residuals in 
each block are less than the truncation errors. The value 
chosen for Tmax depends on the problem being solved and 
the desired degree of accuracy. 



VII. CODE TESTS 

In this section we present code tests to demonstrate 
second order convergence and the computational advan- 
tages of AMR. Figure O shows a comparison of errors for 
the test case described in Sec. The analytical errors 
on the z-axis are plotted for an AMR grid and for uni- 
form grids with resolutions at levels 4, 5, 6, and 7. (A 
grid with resolution level X is created by refining a single 
block X — 1 times. See Fig. 1.) For the AMR run we 
start with a uniform level 3 grid and set the refinement 
criterion for a maximum truncation error of Tmax = 0.001 
in each block. We also limited the highest resolution to 



FIG. 6: AMR grid structure in the x-y plane. Each square 
corresponds to a block of data containing 8^ computational 
cells. 

level 7, so the truncation errors in some of the level 7 
blocks reached as high as 0.007. The cell size for resolu- 
tion level 4 is Ax = 0.25, and the cell size for resolution 
level 7 is Aa; = 0.03125. The grid structure chosen by 
AMRMG for the AMR run is shown in Fig. 

Observe that the errors in each region of the AMR grid 
are comparable to the errors obtained with a uniform grid 
of the same resolution. For example, the errors in the 
level 7 region of the AMR solution (the region surround- 
ing the origin) are slightly larger than the errors obtained 
from the uniform level 7 solution, and smaller than the 
errors obtained from the uniform level 6 solution. We also 
note that the savings in memory with AMR is profound; 
the AMR grid solution can be calculated on a single pro- 
cessor, while the level 7 uniform grid solution required 64 
processors to handle the memory requirements. 

As a test for second order convergence, we consider the 
initial data for a single black hole. Valid initial data must 
satisfy the constraint equations of general relativity. For 
a vacuum spacetime, the Hamiltonian and momentum 
constraints are given respectively by 

R + - KijK'^ =0 , (14) 

and 

Dj{K'^ ^g'^K)^{) , (15) 

where gij is the physical metric and g"^^ is its inverse. 
Also, R is the scalar curvature, Kij is the extrinsic cur- 
vature with trace K = Kl, and Dj is the covariant deriva- 
tive associated with the spatial metric. These initial 
value equations must be rewritten as a well-posed el- 
liptic boundary value problem. The standard techniques 
for rewriting the constraint equations are based on the 
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York-Lichncrowicz conformal decomposition [iSj. Fol- 
lowing this approach, we assume that the physical metric 
gij is conformally related to a background metric gij , 



5y V'^ffy 



(16) 



where ip"^ is the conformal factor. The physical extrinsic 
curvature is written as 

= ip-^IUj ■ (17) 

In terms of these conformal variables, the Hamiltonian 
and momentum constraints read 

Sy^ip ~ipR + Tp-'^ [KijK'^ - = , (18a) 
\7i{K'^ - g'^k) = -A^-^kV^iP , (18b) 

where Vi and R arc the covariant derivative and scalar 
curvature associated with the background metric gij . 

The "puncture method" [s^l is a way of specifying 
black hole initial data on Ft^. The background metric 
is chosen to be flat. The momentum constraint Ijl8bp is 
solved analytically by 

K"^ = IT^iP'n^ + P^n' - (g'^ - n'n^)P^nk) (19) 

where is the momentum of the black hole and is the 
radial normal vector (in the flat background with Carte- 
sian coordinates). Note that fC^^ is traceless, K = Q. 
This expression (|19|) for K"^^ can be generalized to in- 
clude an arbitrary number of black holes with spin and 
momentum, but for simplicity we will use a single black 
hole with no spin for our test case. To complete the spec- 
ification of initial data, we must solve the Hamiltonian 
constraint (|18a|l for the conformal factor ip. With the 
puncture method, the solution ip is split into a known 
singular term and a nonsingular term u: 



m 
2FI 



(20) 



Here, m is the "bare mass" of the black hole and is the 
coordinate distance from the origin. With the puncture 
method splitting of the conformal factor, the Hamilto- 
nian constraint becomes 



where 



1 ~ ~ 2\r\ 
f3 = -a'^K'^K,, and a = 

8 771 



(21) 



(22) 



Equation H21|l is solved for the nonsingular function u on 
with Robin boundary conditions -§;:[r{(p — 1)] = 0. 
For these tests we use a fixed mesh refinement (FMR) 
grid with an "X plus 3" {Xp3) structure. The terminol- 
ogy Xp3 means that the grid is composed of 4 refinement 
regions, with the coarsest part of the grid at level X and 
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FIG. 7: Contour plot of the non-singular part of solution u. 
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FIG. 8: Three point convergence test of puncture data. 



the finest part of the grid at level X + 3. The different 
levels are nested in a "box~in-box" fashion. The finest 
level, with resolution X + 3, extends from —2 to 2 in each 
coordinate direction. The level with resolution X + 2 cov- 
ers the domain between —4 and 4, excluding the finest 
level. The level with resolution X + 1 covers the domain 
between —8 and 8 excluding the finer levels. The coars- 
est level, with resolution X, covers the domain between 
— 16 and 16 excluding the finer levels. 

For our test case we have chosen m = 1 and = 
(0, 0, 1). We solve Eq. using the series of FMR grids 
3p3, 4p3, 5p3, 6p3, and 7p3. Each successive FMR grid 
has the same boundaries and double the resolution of 
the previous grid. Figure shows a contour plot of the 
solution u in the y-z plane, obtained with the 6p3 grid. 

Figures ISI and |51 show the results of a three-point con- 
vergence test for data along the z axis. This data 
passes through the puncture (at the origin) where the 
solution u and its derivatives are changing most rapidly. 
The three-point convergence test is obtained by plotting 
the difference between solutions on successive FMR grids, 
multiplied by an appropriate power of 4. The top (red) 



11 




z 

FIG. 9: Three point convergence test of puncture data close 
to puncture. 

curve shown in Figs. |H1 and El is the difference between 
the solution u obtained on the 4p3 grid and the solution 
obtained on the 3p3 grid. The next curve is the difference 
between solutions on FMR grids 5p3 and 4p3, multiplied 
by 4. The third curve is the difference between solutions 
on FMR grids 6p3 and 5p3, multiplied by 16. Finally, 
the curve that has the most negative value at the origin 



in Fig. is the difference between solutions on the FMR 
grids 7p3 and 6p3, multiplied by 64. One can see that the 
curves in Figs. |H1 and overlay one another in the limit of 
high resolution. This shows that the errors in AMRMG 
are second order in the grid spacing. 
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